Dual-Modal Transformer with Enhanced Inter- and Intra-Modality Interactions for Image Captioning
Abstract
Image captioning aims to describe an image in words that convey a semantic, relatable account of the scene it depicts. Different models can be used for this arduous task, depending on the context and on what needs to be achieved. An encoder–decoder model that takes image feature vectors as input to the encoder is often regarded as an appropriate approach. In the proposed work, a dual-modal transformer captures intra- and inter-modality interactions simultaneously within an attention block. The architecture is evaluated quantitatively on the publicly available Microsoft Common Objects in Context (MS COCO) dataset, yielding a Bilingual Evaluation Understudy (BLEU)-4 score of 85.01. The model's efficacy is also evaluated on the Flickr 8k and Flickr 30k datasets, and the results on these and on MS COCO are compared with and analysed against state-of-the-art methods. The results show that the proposed model outperforms conventional models.
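To make "intra- and inter-modality interactions in one attention block" concrete, here is a minimal pure-Python sketch of one way to do it: concatenate image-region features and caption-word features into a single sequence and run one masked scaled dot-product attention over it, so each word attends to image regions (inter-modality) and to earlier words (intra-modality) in the same softmax. This is an illustrative assumption, not the paper's architecture: the function names (`dual_modal_attention`, `joint_attention_mask`), the masking scheme, the single head, and the random untrained features are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    # Subtract the max for numerical stability; -inf entries become exactly 0.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention_mask(n_regions, n_words):
    """True where query i may attend to key j (hypothetical masking scheme).

    Regions attend only to regions; word i attends to every region and to
    words up to and including itself, so no word sees future caption tokens.
    """
    n = n_regions + n_words
    return [[(j < n_regions) if i < n_regions else (j <= i) for j in range(n)]
            for i in range(n)]


def dual_modal_attention(regions, words, wq, wk, wv):
    """Illustrative single-head attention over the joint region+word sequence.

    Not the paper's exact block: one softmax mixes region keys (inter-modality)
    and word keys (intra-modality) for every word query.
    """
    tokens = regions + words
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = 1.0 / math.sqrt(len(wq[0]))
    mask = joint_attention_mask(len(regions), len(words))
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale if ok else float("-inf")
                        for s, ok in zip(row, mrow)])
               for row, mrow in zip(scores, mask)]
    return matmul(weights, v), weights


def random_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    d = 4
    regions, words = random_matrix(3, d), random_matrix(2, d)  # 3 regions, 2 words
    wq, wk, wv = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
    out, w = dual_modal_attention(regions, words, wq, wk, wv)
    for i, row in enumerate(w[len(regions):]):
        inter = sum(row[:len(regions)])  # attention mass on image regions
        intra = sum(row[len(regions):])  # attention mass on caption words
        print(f"word {i}: inter={inter:.2f} intra={intra:.2f}")
```

Putting both modalities in one softmax means the split between looking at the image and looking at the caption so far is learned per query, rather than fixed by stacking separate self-attention and cross-attention layers.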
Journal
Journal title: Applied Sciences
Year: 2022
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app12136733